Abstract: We describe our participation in the TREC 2008 Enterprise
track and detail our language modeling-based approaches. For document
search, our focus was on query expansion using profiles of top ranked
experts and on document priors. We found that these techniques result
in small but noticeable improvements over our baseline method. For
expert search, we combine candidate- and document-based models, and
also bring in web evidence. We found that the combined models
significantly and consistently outperformed our very competitive
baseline models.


1  Introduction 

As in last year's edition, the TREC 2008 Enterprise track featured two
separate tasks: document search and expert finding. For both tasks, we
experiment with a query expansion technique using profiles of top
ranked experts and with encoding query-independent features as
(document and candidate) priors. Further, for the expert search task
we consider both candidate- and document-based models, as well as
their combination.

Our main findings are that for document search our attempts at query
modeling and the use of document priors meet with limited success,
although noticeable improvements in average precision can be observed.
For expert finding, we arrive at more interesting findings. First, in
contrast with the literature and with our previous studies [3, 7], we
find that candidate models (introduced as "Model 1" in [3]) can
outperform document-based models (a.k.a. "Model 2" from [3]).
Specifically, we compare a proximity-based version of the
candidate-based model ("Model 1B"), complemented with a fine-grained
method for estimating the strength of the association between
documents and candidates, based on global statistics and semantic
relatedness [2], with the document-based model employed on top of our
best performing document search run. Second, we find that a
combination of the two strategies (Model 1B and Model 2) outperforms
both. Third, query modeling, using blind feedback both from documents
and experts, helps improve retrieval performance. Fourth, bringing in
web evidence boosts performance even further.


The paper is organized as follows. We discuss our work on the document
search task (Section 2) and on the expert search task (Section 3) in
two largely independent sections. We summarize our findings and put
forward suggestions for future work in Section 4.

2  Document  Search 

The aim of the document search task is to retrieve documents that help
a science communicator within an organization (in this case CSIRO)
create an overview page for a given topical area. Relevant documents
are therefore documents that discuss the given topic in detail and not
the ones that only touch on the topic. Last year the usual TREC-style
topic definitions were expanded with a number of examples of key
pages. These example documents could then be used to construct rich
query models [5, 6]. One of our major aims this year is to devise ways
of constructing rich query models when such elaborate specifications
of information needs are not available. In addition, we experiment
with using a document prior.

2.1  Modeling 

We employ a standard language modeling approach to IR and rank
documents by their log-likelihood of being relevant given a query.
Without presenting details here we only provide our final formula for
ranking documents, and refer the reader to [6] for a derivation of
this equation:

$$\log P(D \mid Q) \propto \log P(D) + \sum_{t \in Q} P(t \mid \theta_Q) \cdot \log P(t \mid \theta_D). \tag{1}$$

Here, both documents and queries are represented as multinomial
distributions over terms in the vocabulary. We estimate each document
model ($\theta_D$) by:

$$P(t \mid \theta_D) = (1 - \lambda_D) \cdot P(t \mid D) + \lambda_D \cdot P(t), \tag{2}$$

where $P(t \mid D)$ and $P(t)$ are maximum likelihood estimates of the
term $t$ on the document and on the collection, respectively, and
$\lambda_D$ is a smoothing parameter.

Next, we address the estimation of the other two components of our
modeling: the query model $\theta_Q$ in Section 2.1.1 and document
priors $P(D)$ in Section 2.1.2.
2.1.1  Query  models

We consider constructing the query model from three components
according to the following equation:

$$P(t \mid \theta_Q) = \lambda_Q \cdot P(t \mid \hat{\theta}_Q) + \mu \cdot P(t \mid \tilde{\theta}_Q) + (1 - \lambda_Q - \mu) \cdot P(t \mid Q). \tag{3}$$

Here, $P(t \mid \hat{\theta}_Q)$ is estimated using relevance models
(method 2) of Lavrenko and Croft [11], $P(t \mid \tilde{\theta}_Q)$ is
constructed from profiles of candidate experts, and $P(t \mid Q)$ is
the initial query.
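As a rough illustration of Eq. 3, the three components can be mixed as follows. This is a sketch under our own assumptions: each component is a dictionary mapping terms to probabilities, and the names `interpolate_query_model`, `lam_q`, and `mu` are hypothetical.

```python
def interpolate_query_model(p_query, p_feedback, p_experts, lam_q, mu):
    """Mix the initial query, the relevance model, and the expert-profile
    component with weights (1 - lam_q - mu), lam_q, and mu (Eq. 3)."""
    if lam_q < 0 or mu < 0 or lam_q + mu > 1:
        raise ValueError("weights must be non-negative and sum to at most 1")
    terms = set(p_query) | set(p_feedback) | set(p_experts)
    rest = 1.0 - lam_q - mu
    return {
        t: lam_q * p_feedback.get(t, 0.0)
        + mu * p_experts.get(t, 0.0)
        + rest * p_query.get(t, 0.0)
        for t in terms
    }
```

Setting both weights to zero recovers the initial query, and if each input distribution sums to one, so does the result.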

Sampling expansion terms from expert profiles is performed using the
following algorithm. First, we rank experts using expert finding
Model 1B described in Section 3.1.1. Then, we obtain $P(t \mid S)$ by
taking terms from the profiles of the top ranked $M$ experts:

$$P(t \mid S) = \sum_{ca \in M} P(t \mid \theta_{ca}) \cdot P(ca \mid S), \tag{4}$$

where $P(t \mid \theta_{ca})$ is the probability of term $t$ given the
candidate's language model, and $P(ca \mid S)$ is proportional to how
likely candidate $ca$ is an expert, given the top $M$ experts:

$$P(ca \mid S) = \frac{P(ca \mid Q)}{\sum_{ca' \in M} P(ca' \mid Q)}. \tag{5}$$


Calculating the sampling distribution $P(t \mid S)$, therefore, can be
viewed as the following generative process:

1.  Let the set of candidate experts $\{ca \in M\}$ be given.

2.  Select a candidate $ca$ from this set with probability
$P(ca \mid S)$.

3.  From this candidate, generate the term $t$ with probability
$P(t \mid \theta_{ca})$.

Finally, we take the top $K$ terms from $P(t \mid S)$ to form the
expert-profile component $P(t \mid \tilde{\theta}_Q)$ of the query
model.
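The steps above can be sketched in Python as follows. The input format (dictionaries of expert scores $P(ca \mid Q)$ and of per-candidate term distributions) and the function name are hypothetical. Renormalizing the top $K$ terms so that they sum to one is our assumption; the text does not say how the truncated distribution is normalized.

```python
def expert_expansion_terms(expert_scores, expert_models, m, k):
    """Return the top-k terms of P(t | S), built from the top-m experts.

    expert_scores: {candidate: P(ca | Q)}, assumed positive
    expert_models: {candidate: {term: P(t | theta_ca)}}
    """
    top = sorted(expert_scores, key=expert_scores.get, reverse=True)[:m]
    total = sum(expert_scores[ca] for ca in top)

    p_t_s = {}
    for ca in top:
        p_ca_s = expert_scores[ca] / total  # Eq. 5
        for term, p in expert_models[ca].items():
            p_t_s[term] = p_t_s.get(term, 0.0) + p * p_ca_s  # Eq. 4

    top_terms = sorted(p_t_s, key=p_t_s.get, reverse=True)[:k]
    norm = sum(p_t_s[t] for t in top_terms)
    # Assumption: renormalize the truncated distribution.
    return {t: p_t_s[t] / norm for t in top_terms}
```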


2.1.2  Document  priors 

Since we are looking for key pages, our intuition is that these pages
have shorter URLs than non-key pages. This heuristic has already
proved useful for web document search and can effectively be encoded
as a document prior [9, 10]. We set $P(D)$ in Eq. 1 as follows:

$$P(D) \propto C - \mathrm{URL\_LENGTH}(D), \tag{6}$$

where $C$ is a constant (here set to 255), and
$\mathrm{URL\_LENGTH}(D)$ denotes the length of the URL (number of
characters) of document $D$.
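A minimal sketch of the URL-length prior in Eq. 6 is given below. Two details are our assumptions rather than part of the description above: URLs longer than $C$ characters are clamped to a prior of zero, and the values are normalized over the given set of URLs so that they sum to one.

```python
def url_length_prior(urls, c=255):
    """Assign each URL a prior proportional to c - len(url) (Eq. 6)."""
    # Assumption: clamp at zero so that very long URLs do not go negative.
    raw = {u: max(c - len(u), 0) for u in urls}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("all URLs are at least c characters long")
    return {u: v / total for u, v in raw.items()}
```

Because only the ordering matters for ranking, the normalization constant could equally be dropped and `log(c - len(url))` added directly to the document score.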


2.2  Runs 

We submitted the runs listed below, all of which were automatic. To
estimate the parameters of our models, such as the number of feedback
documents and terms, and the interpolation weights in Eq. 3, we use
the 2007 topic set.


UvA08DSbl  the baseline run; uses only the initial query without
expansion ($\lambda_Q = \mu = 0$) and document priors are set to be
uniform.

UvA08DSbfb  blind feedback run; query model uses the relevance model
component ($\lambda_Q = 0.5$, top 10 terms from top 5 documents) but
not the expert profiles component ($\mu = 0$). Document priors are set
to be uniform.

UvA08DSexp  query expansion using expert profiles; same as UvA08DSbfb
but with $\lambda_Q = 0.4$ and also using candidate profiles for
expansion ($\mu = 0.2$, top 10 terms from top 5 experts). Document
priors are set to be uniform.

UvA08DSall  all features; query model is constructed as in UvA08DSexp
and document priors are set based on URL character length.

For the estimation of the document language model ($\theta_D$) we
employ Bayes smoothing with Dirichlet priors, i.e., we put
$\lambda_D = \beta / (|D| + \beta)$ in Eq. 2, and set $\beta$ to be the
average document length ($\beta = 260$).
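To make the link between Dirichlet smoothing and the interpolation in Eq. 2 concrete, the sketch below computes the document-dependent interpolation weight and checks it against the usual Dirichlet form. The function names and the name `beta` for the Dirichlet parameter (260 in the text) are ours.

```python
def dirichlet_lambda(doc_len, beta=260.0):
    """Document-dependent smoothing weight: beta / (|D| + beta)."""
    return beta / (doc_len + beta)


def dirichlet_prob(tf, doc_len, p_coll, beta=260.0):
    """Usual Dirichlet-smoothed estimate: (tf + beta * P(t)) / (|D| + beta)."""
    return (tf + beta * p_coll) / (doc_len + beta)


def interpolated_prob(tf, doc_len, p_coll, beta=260.0):
    """The same estimate written as the linear interpolation of Eq. 2."""
    lam = dirichlet_lambda(doc_len, beta)
    p_doc = tf / doc_len if doc_len else 0.0
    return (1.0 - lam) * p_doc + lam * p_coll
```

A document of exactly average length receives a weight of 0.5, and longer documents are smoothed less, since their own term statistics are more reliable.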

2.3  Results 

Our results for the 2008 document search task are listed in Table 2.
In terms of infAP, UvA08DSall outperforms the other runs, but in terms
of infNDCG, no run beats the baseline run UvA08DSbl. For comparison,
we have included the results of runs produced on last year's data; see
Table 1. Although the official metrics used in 2007 were different
from those used in 2008, we can observe similar patterns: UvA07DSall
beats the other approaches on all metrics except MRR, where the
baseline beats the other approaches.

Run          MAP    P5     P10    P20    MRR
UvA07DSbl    .3853  .6520  .5940  .4870  .8675
UvA07DSbfb   .3953  .6560  .6100  .4930  .8030
UvA07DSexp   .4002  .6640  .6040  .4920  .7981
UvA07DSall   .4056  .6800  .6140  .4930  .8098


Table  1:  Results  for  the  document  search  task:  2007  topic 
set.  Best  scores  for  each  metric  are  in  boldface. 


Run          infAP  infNDCG
UvA08DSbl    .3103  .4938
UvA08DSbfb   .3209  .4889
UvA08DSexp   .3242  .4854
UvA08DSall   .3306  .4909

Table  2:  Results  for  the  document  search  task:  2008  topic 
set.  Best  scores  for  each  metric  are  in  boldface. 


3  Expert  Search 

For the expert search task, our aim was to experiment with a
proximity-based version of the candidate model that we have introduced
before [2], to combine it with document-based models, to determine the
effectiveness of query modeling, and to bring in web evidence.

3.1  Modeling 

Our approach to ranking candidates is as follows:

$$P(ca \mid Q) \propto P(ca) \cdot P(Q \mid ca), \tag{7}$$

where $P(ca)$ is the a priori probability of the candidate $ca$ being
an expert, and $P(Q \mid ca)$ is the probability of $ca$ generating the
query $Q$. Our choice of setting $P(ca)$ is presented in Section
3.1.3. For estimating $P(Q \mid ca)$ we consider both candidate
(Section 3.1.1) and document (Section 3.1.2) models.
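In log space, the ranking in Eq. 7 amounts to adding the log prior to the log query likelihood. The sketch below is only an illustration: the function name and input dictionaries are hypothetical, candidates without an explicit prior are assumed to have prior 1, and a prior of zero is taken to remove the candidate from the ranking.

```python
import math


def rank_candidates(log_likelihoods, priors):
    """Rank candidates by log P(ca) + log P(Q | ca) (Eq. 7).

    log_likelihoods: {candidate: log P(Q | ca)}
    priors: {candidate: P(ca)}; missing candidates default to 1.0
    """
    scored = []
    for ca, log_lik in log_likelihoods.items():
        prior = priors.get(ca, 1.0)
        if prior <= 0.0:
            continue  # a zero prior excludes the candidate entirely
        scored.append((ca, math.log(prior) + log_lik))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```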


3.1.1  Candidate model (Model 1B)

We use a proximity-based version of the candidate model, referred to
as Model 1B [7]. Here, a language model $\theta_{ca}$ is inferred for
each candidate and the log-query-likelihood of a candidate producing
the query is obtained as follows:

$$\log P(Q \mid ca) = \sum_{t \in Q} P(t \mid \theta_Q) \cdot \log P(t \mid \theta_{ca}), \tag{8}$$

where $P(t \mid \theta_{ca})$ is a linear interpolation between an
empirical candidate model ($P(t \mid ca)$) and the background
(collection) language model ($P(t)$):

$$P(t \mid \theta_{ca}) = (1 - \lambda_{ca}) \cdot P(t \mid ca) + \lambda_{ca} \cdot P(t). \tag{9}$$

The probability $P(t \mid ca)$ is estimated based on the co-occurrence
of the term $t$ and candidate $ca$ within a window of size $w$ (which
was set to 125 based on empirical exploration). The model we use
corresponds to Model 1B with semantic document-candidate associations
(SEM) described in [6].

Recent work on expertise retrieval has indicated the usefulness of web
evidence [8, 12]. In these studies Model 2 is applied on top of search
engine results (either snippets or full documents). We also used web
evidence, but in a candidate-based fashion. A web-based variation of
Model 1B was employed, where the candidate's name was used as a query,
issued to a web search engine API (in our case: Yahoo!). Then, text
from the top 100 result snippets was used to construct $P(t \mid ca)$.

3.1.2  Document model (Model 2)

Using a document-based model, $P(Q \mid ca)$ is estimated as follows:

$$P(Q \mid ca) = \sum_{D} P(Q \mid D) \cdot P(D \mid ca). \tag{10}$$

We use the approach developed for ranking documents to estimate
$P(Q \mid D)$ (see Section 2.1). As to $P(D \mid ca)$, we use the
semantic relatedness of document $D$ and candidate $ca$ (the same
settings as for the candidate model); see [1, Section 6.3.5] for
details.

3.1.3  Candidate priors

We use candidate priors to filter out science communicators (SC)
(often called communication officer/manager/advisor or manager public
affairs communication). Following [2], we first extracted names and
positions from contact boxes of CSIRO pages. Then, SCs were assigned
the value 0, while all other people were assigned the value 1 as a
candidate prior:

$$P(ca) = \begin{cases} 1, & ca \notin SC, \\ 0, & ca \in SC. \end{cases} \tag{11}$$

3.1.4  Runs

We submitted the following 4 runs:

UvA08ESm1b  Model 1B using the initial query (without expansion).

UvA08ESm2all  Model 2 using expanded query models and all document
search features (on top of document search run UvA08DSall).

UvA08EScomb  linear combination of Model 1B (with weight 0.7) and
Model 2 (with weight 0.3). Both models use the initial query (without
expansion).

UvA08ESweb  linear combination of the run UvA08EScomb (with weight
0.75) and the web-based variation of Model 1B (with weight 0.25). The
web run uses the query model from UvA08DSexp.

We  employed  candidate  priors  as  described  in  Section  3.1.3 
for  all  runs. 
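The document-based estimate of Eq. 10 and the linear combination used in the combined runs can be sketched together as follows. This is an illustration under our own assumptions, not a description of the submitted system: the function names and data layout are hypothetical, and each run's scores are normalized to sum to one before mixing, a step the run descriptions do not specify.

```python
def model2_scores(doc_scores, associations):
    """Eq. 10: P(Q | ca) = sum over D of P(Q | D) * P(D | ca).

    doc_scores: {doc: P(Q | D)}
    associations: {candidate: {doc: P(D | ca)}}
    """
    return {
        ca: sum(doc_scores.get(doc, 0.0) * p for doc, p in docs.items())
        for ca, docs in associations.items()
    }


def normalize(scores):
    """Scale scores to sum to one (assumed preprocessing step)."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {ca: s / total for ca, s in scores.items()}


def combine_runs(run_a, run_b, weight_a):
    """Linear combination: weight_a * run_a + (1 - weight_a) * run_b."""
    a, b = normalize(run_a), normalize(run_b)
    return {
        ca: weight_a * a.get(ca, 0.0) + (1.0 - weight_a) * b.get(ca, 0.0)
        for ca in set(a) | set(b)
    }
```

Under these assumptions, a run like UvA08EScomb would correspond to `combine_runs(model1b, model2, 0.7)`, and a run like UvA08ESweb to combining that result with the web-based scores using weight 0.75.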

3.2  Results 

Table 4 shows that the most successful strategy is to put everything
together: UvA08ESweb outperforms our other runs. Interestingly,
Model 1B outperforms Model 2; note that the run labeled UvA08ESm1b
does not employ query expansion, while UvA08ESm2all uses features that
improved performance on the document search task (see Section 2.3),
including query expansion. Furthermore, we see that a combination of
the two methods outperforms both models on all metrics. Finally,
bringing in web evidence improves retrieval performance even further
(see the run labeled UvA08ESweb). Looking at the corresponding scores
on the 2007 topic set (Table 3), we observe very similar behavior.


Run            #rel_ret  MAP    P@5    P@10   MRR
UvA07ESm1b     124       .4838  .2800  .1740  .6334
UvA07ESm2all   126       .4799  .2600  .1800  .6268
UvA07EScomb    121       .5267  .2880  .1820  .6828
UvA07ESweb     122       .5405  .3080  .1780  .6468

Table  3:  Results  for  the  Expert  Search  task:  2007  topic  set. 
Best  scores  for  each  metric  are  in  boldface. 


Run            #rel_ret  MAP    P@5    P@10   MRR
UvA08ESm1b     394       .3935  .4836  .3473  .8223
UvA08ESm2all   395       .3679  .4473  .3436  .6831
UvA08EScomb    419       .4331  .4982  .3836  .8547
UvA08ESweb     425       .4490  .5527  .3982  .8721

Table  4:  Results  for  the  Expert  Search  task:  2008  topic  set. 
Best  scores  for  each  metric  are  in  boldface. 

4  Conclusions 

We described our participation in the TREC 2008 Enterprise track.
Building on our earlier work [1-7], we employed a standard language
modeling setting for both the document and expert tasks. Our aim for
the document search task was to experiment with query expansions and
with document priors. While we observed improvements, our overall
conclusion is that these techniques resulted in limited success.

As to the expert search task, our experiments concerned the
combination of candidate- and document-based methods, and bringing in
web evidence. We found that these models captured different experts,
and therefore, combining them resulted in substantial improvements for
all metrics.

These  results  suggest  that  possible  improvements  might  be 
pursued  in  the  combination  of  methods,  as  well  as  in  further 
use  of  web  evidence. 